Heuristic Measures of Interestingness
Authors
Abstract
The tuples in a generalized relation (i.e., a summary generated from a database) are unique and can therefore be considered a population whose structure can be described by some probability distribution. In this paper, we present and empirically compare sixteen heuristic measures that evaluate the structure of a summary and assign it a single real-valued index representing its interestingness relative to other summaries generated from the same database. The heuristics are based on well-known measures of diversity, dispersion, dominance, and inequality used in several areas of the physical, social, ecological, management, information, and computer sciences; their use for ranking summaries generated from databases is a new application area. All sixteen heuristics rank less complex summaries (i.e., those with few tuples and/or few non-ANY attributes) as most interesting. We also demonstrate that, for sample data sets, the rankings that some of the measures produce are highly correlated.
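The abstract treats a summary's tuples as a probability distribution, scores each summary with a diversity-style measure, and compares the rankings those scores induce. The following is a minimal Python sketch of that idea, assuming each summary is given as a list of per-tuple counts. The three measures (variance around the uniform share, Simpson's index, Shannon entropy) are standard textbook forms used here as assumptions, not necessarily the paper's exact definitions, and the orientation (more concentrated = more interesting, so Shannon entropy is ranked ascending) is likewise assumed. Kendall's tau-a compares two rankings. The summaries `S1` to `S4` and their counts are hypothetical toy data.

```python
import math
from itertools import combinations

# Hypothetical toy summaries: each list holds the counts behind the unique
# tuples of one generalized relation (summary).
summaries = {
    "S1": [90, 10],          # two tuples, skewed
    "S2": [50, 50],          # two tuples, even
    "S3": [40, 30, 20, 10],  # four tuples, skewed
    "S4": [25, 25, 25, 25],  # four tuples, even
}


def probabilities(counts):
    """Turn a summary's tuple counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def variance_measure(counts):
    """Variance of tuple probabilities around the uniform share 1/m (assumed form)."""
    p = probabilities(counts)
    m = len(p)
    if m < 2:
        return 0.0  # assumption: a single-tuple summary scores 0
    q = 1.0 / m
    return sum((pi - q) ** 2 for pi in p) / (m - 1)


def simpson_measure(counts):
    """Simpson's index: chance that two draws (with replacement) hit the same tuple."""
    return sum(pi ** 2 for pi in probabilities(counts))


def shannon_measure(counts):
    """Shannon entropy in bits; lower means a more concentrated summary."""
    return -sum(pi * math.log2(pi) for pi in probabilities(counts) if pi > 0)


def rank(summaries, measure, higher_is_more_interesting=True):
    """Order summary names from most to least interesting under one measure."""
    scores = {name: measure(counts) for name, counts in summaries.items()}
    # sorted() stays stable with reverse=True, so ties keep insertion order.
    return sorted(scores, key=scores.get, reverse=higher_is_more_interesting)


def kendall_tau(xs, ys):
    """Kendall's tau-a between two score vectors (tied pairs count as neither)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two summaries")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


names = list(summaries)
simpson = [simpson_measure(summaries[n]) for n in names]
# Negate entropy so that, like the other two, higher = more interesting.
neg_shannon = [-shannon_measure(summaries[n]) for n in names]
variance = [variance_measure(summaries[n]) for n in names]

if __name__ == "__main__":
    print(rank(summaries, simpson_measure))  # ['S1', 'S2', 'S3', 'S4']
    print(rank(summaries, shannon_measure, higher_is_more_interesting=False))  # ['S1', 'S2', 'S3', 'S4']
    print(rank(summaries, variance_measure))  # ['S1', 'S3', 'S2', 'S4']
    print(kendall_tau(simpson, neg_shannon))  # 1.0
    print(kendall_tau(simpson, variance))  # 0.5
```

On this toy data, Simpson's index and Shannon entropy produce the same order (tau = 1.0), while the variance measure swaps S2 and S3 (tau = 0.5). Tau-a does not correct for the tie between S2 and S4 under the variance measure; a tie-adjusted variant such as tau-b would be the natural refinement.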
Similar resources
Ranking the Interestingness of Summaries from Data Mining Systems
We study data mining where the task is description by summarization, the representation language is generalized relations, the evaluation criteria are based on heuristic measures of interestingness, and the method for searching is the Multi-Attribute Generalization algorithm for domain generalization graphs. We present and empirically compare four heuristics for ranking the interestingness of ...
Heuristics for Ranking the Interestingness of Discovered Knowledge
We describe heuristics, based upon information theory and statistics, for ranking the interestingness of summaries generated from databases. The tuples in a summary are unique, and therefore, can be considered to be a population described by some probability distribution. The four interestingness measures presented here are based upon common measures of diversity of a population: variance, the ...
Assessing the Interestingness of Discovered Knowledge Using a Principled Objective Approach
When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be gener...
Selecting Perfect Interestingness Measures by coefficient of variation based Ranking Algorithm
Ranking interestingness measures is an active and essential research area in knowledge discovery from extracted rules. Because the many measures proposed by researchers for different situations keep lengthening the list, and none of them can serve as a common measure for evaluating rules, knowledge finders cannot identify a perfect measure to ensure the actu...
DRP Report: Quantitative Association Mining From Bottom Up and Heuristic Search Perspectives
Traditional association mining focuses on discovering frequent patterns in categorical data, such as supermarket transaction data. Quantitative association mining (QAM) is a natural extension of traditional association mining: it refers to the task of discovering association rules from quantitative data instead of categorical data. The discrepancies between the two typ...
Publication date: 1999